Land Cover Classification at the Mississippi Delta¶
Background Information¶
This study uses the harmonized Sentinel-2/Landsat (HLS) multispectral dataset to examine patterns in vegetation. The HUC region 8 watershed extends from Missouri to the Gulf of Mexico; its lower extent near New Orleans is the focus of this analysis. The EPA ecoregion designations are Mississippi Alluvial and Southeast USA Coastal Plains. According to a publication by the Louisiana Geological Survey, the area supports "...a diversity of grasses, sedges, and rushes." However, there has been a tremendous amount of human engineering of this environment.¶
Sources¶
- USDA (2012), Response to RFI for Long-Term Agro-ecosystem Research (LTAR) Network, available online at: https://www.ars.usda.gov/ARSUserFiles/np211/LMRBProposal.pdf.
- John Snead, Richard P. McCulloh, and Paul V. Heinrich (2019) Landforms of the Louisiana Coastal Plain, Louisiana Geological Survey, available online at: https://www.lsu.edu/lgs/publications/products/landforms_book.pdf
In [41]:
# Import needed packages
# Standard libraries
import os # for operating system manipulation
import pathlib # for managing files and directories
import pickle # enables serialization of objects for saving
import re # pattern matching in strings
import warnings # control how warnings are displayed
# Geospatial Packages
import cartopy.crs as ccrs # coordinate reference system for maps
import earthaccess # simplifies access to nasa earthdata
import earthpy as et # functions that work with raster and vector data
import geopandas as gpd # read and manipulate geo dataframes
import geoviews as gv # geospatial visualization
import hvplot.pandas # plots pandas dataframes
import hvplot.xarray # plots data arrays
import matplotlib.cm as cm # colormap utilities
import matplotlib.pyplot as plt # static plotting
import numpy as np # numerical computation
import pandas as pd # tabular dataframes
import rioxarray as rxr # combine xarray with geospatial raster data
import rioxarray.merge as rxrmerge # merging multiple raster datasets
import seaborn as sns # statistical data visualization
from tqdm.notebook import tqdm # tracking processes with progress bar
import xarray as xr # gridded datasets
from shapely.geometry import Polygon # analyze geometric objects
from sklearn.cluster import KMeans # machine learning algorithm to group data
from sklearn.datasets import make_blobs # generate synthetic clustered data
from sklearn.metrics import silhouette_samples, silhouette_score # evaluate cluster quality
# Environmental Variables
os.environ["GDAL_HTTP_MAX_RETRY"] = "5" # geospatial data abstraction library for website query
os.environ["GDAL_HTTP_RETRY_DELAY"] = "1" # combined lines try website 5 times with 1 second delay between
warnings.simplefilter('ignore') # suppress warnings
Organize Information Downloads¶
In [42]:
# Setup data directories
data_dir = os.path.join(
pathlib.Path.home(),
'Documents',
'eaclassprojects',
'clusters',
'land'
)
os.makedirs(data_dir, exist_ok=True)
data_dir
Out[42]:
'C:\\Users\\stem2\\Documents\\eaclassprojects\\clusters\\land'
Using a Decorator to Cache Results¶
In [43]:
# Define decorator
def cached(func_key, override=False):
# save function results for retrieval and allow overwrite
"""
A decorator to cache function results
Parameters
==========
func_key: str
File basename used to save pickled results
override: bool
When True, re-compute even if the results are already stored
"""
# Wrap caching function
def compute_and_cache_decorator(compute_function):
"""
Wrap the caching function
Parameters
==========
compute_function: function
The function to run and cache results
"""
# detail usage of func_key
def compute_and_cache(*args, **kwargs):
"""
Perform a computation and cache, or load cached result.
Parameters
==========
args
Positional arguments for the compute function
kwargs
Keyword arguments for the compute function
"""
# Add an identifier from the particular function call
if 'cache_key' in kwargs:
key = '_'.join((func_key, kwargs['cache_key']))
else:
key = func_key
path = os.path.join(
et.io.HOME, et.io.DATA_NAME, 'jars', f'{key}.pickle')
# Check if the cache exists already or override caching
if not os.path.exists(path) or override:
# Make jars directory if needed
os.makedirs(os.path.dirname(path), exist_ok=True)
# Run the compute function as the user did
result = compute_function(*args, **kwargs)
# Pickle the object
with open(path, 'wb') as file:
pickle.dump(result, file)
else:
# Unpickle the object
with open(path, 'rb') as file:
result = pickle.load(file)
return result
return compute_and_cache
return compute_and_cache_decorator
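The decorator's control flow can be exercised with a minimal, self-contained sketch. A temporary directory stands in for the earthpy jars path, and `demo_square`/`square` are hypothetical names used only for illustration:

```python
import os
import pickle
import tempfile

def cached(func_key, cache_dir, override=False):
    """Minimal sketch of the pickle-caching decorator above."""
    def compute_and_cache_decorator(compute_function):
        def compute_and_cache(*args, **kwargs):
            path = os.path.join(cache_dir, f'{func_key}.pickle')
            if not os.path.exists(path) or override:
                result = compute_function(*args, **kwargs)
                with open(path, 'wb') as file:
                    pickle.dump(result, file)
            else:
                with open(path, 'rb') as file:
                    result = pickle.load(file)
            return result
        return compute_and_cache
    return compute_and_cache_decorator

cache_dir = tempfile.mkdtemp()  # stand-in for the earthpy 'jars' directory
calls = []

@cached('demo_square', cache_dir)
def square(x):
    calls.append(x)  # record every real computation
    return x ** 2

first = square(4)   # cache miss: computes and pickles 16
second = square(4)  # cache hit: loads the pickle; square() body not re-run
```

Note that, as in the original, the function arguments are not part of the cache key: calling with different arguments returns the first cached result unless a distinct `cache_key` (or here, `func_key`) is supplied.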
Study Site¶
In [44]:
# Read watershed boundary dataset shapefile into a GeoDataFrame
@cached('wbd_08')
def read_wbd_file(wbd_filename, huc_level, cache_key):
# Download and unzip
wbd_url = (
"https://prd-tnm.s3.amazonaws.com"
"/StagedProducts/Hydrography/WBD/HU2/Shape/"
f"{wbd_filename}.zip")
wbd_dir = et.data.get_data(url=wbd_url)
# Read desired data
wbd_path = os.path.join(wbd_dir, 'Shape', f'WBDHU{huc_level}.shp')
wbd_gdf = gpd.read_file(wbd_path, engine='pyogrio')
return wbd_gdf
huc_level = 12 # identify HUC level
wbd_gdf = read_wbd_file(
"WBD_08_HU2_Shape", huc_level, cache_key=f'hu{huc_level}') # Call the function using cache
delta_gdf = (
wbd_gdf[wbd_gdf[f'huc{huc_level}']
.isin(['080902030506'])] # filter for specific river basin
.dissolve() # create a single polygon
)
(
delta_gdf.to_crs(ccrs.Mercator()) # Reproject to Mercator for web mapping
.hvplot(
alpha=.2, fill_color='white', # set styling
tiles='EsriImagery', crs=ccrs.Mercator()) # add background map
.opts(title='Mississippi River Watershed, Live Oak, LA', width=600, height=300) # set plot dimensions
)
Out[44]:
In [45]:
wbd_gdf.head()
Out[45]:
| tnmid | metasource | sourcedata | sourceorig | sourcefeat | loaddate | referenceg | areaacres | areasqkm | states | ... | name | hutype | humod | tohuc | noncontrib | noncontr_1 | shape_Leng | shape_Area | ObjectID | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {8AFB1AF9-7296-4303-89DE-14CD073B859A} | {511D2AC8-11BA-45FC-AB98-F69D693D4C44} | Watershed Boundary Dataset (WBD) | Natural Resources and Conservation Service and... | None | 2024-08-15 | 535297,540579 | 29441.81 | 119.15 | LA | ... | Gourd Bayou-Youngs Bayou | S | LE,ID,DD | 080500011308 | 0.0 | 0.0 | NaN | NaN | 1 | POLYGON ((-92.00021 32.53586, -91.99994 32.535... |
| 1 | {916A17A6-B4A0-4FD7-9BB8-FFD1936B15B2} | {511D2AC8-11BA-45FC-AB98-F69D693D4C44} | Watershed Boundary Dataset (WBD) | Natural Resources and Conservation Service and... | None | 2024-08-15 | 535512 | 11406.67 | 46.16 | LA | ... | Hams Creek | S | ID | 080802050104 | 0.0 | 0.0 | NaN | NaN | 2 | POLYGON ((-93.37574 30.58982, -93.3747 30.5891... |
| 2 | {493C7EC1-2F1C-4B84-AFFB-6F6868A9868E} | {511D2AC8-11BA-45FC-AB98-F69D693D4C44} | Watershed Boundary Dataset (WBD) | Natural Resources and Conservation Service and... | None | 2024-08-15 | 547190,559640 | 29138.21 | 117.92 | LA | ... | Caney Creek-Bayou D'Arbonne | S | NM | 080402060503 | 0.0 | 0.0 | NaN | NaN | 3 | POLYGON ((-93.07761 32.88752, -93.07784 32.887... |
| 3 | {49A3C087-B460-4F97-9D99-78CBB675248B} | {511D2AC8-11BA-45FC-AB98-F69D693D4C44} | Watershed Boundary Dataset (WBD) | Natural Resources and Conservation Service and... | None | 2024-08-15 | 77417,78285 | 17759.39 | 71.87 | AR | ... | L'Aigle Creek-Saline River | S | NM | 080402020206 | 0.0 | 0.0 | NaN | NaN | 4 | POLYGON ((-92.08947 33.29383, -92.0897 33.2938... |
| 4 | {0FB41498-11EA-4AB1-AF05-E2A8E5E2E274} | {511D2AC8-11BA-45FC-AB98-F69D693D4C44} | Watershed Boundary Dataset (WBD) | Natural Resources and Conservation Service and... | None | 2024-08-15 | 1628466 | 98564.62 | 398.88 | LA | ... | West Cote Blanche Bay | W | NM | 080801030800 | 0.0 | 0.0 | NaN | NaN | 5 | POLYGON ((-91.62408 29.73947, -91.62195 29.737... |
5 rows × 21 columns
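The filter-and-dissolve step above (`isin()` to select HUC12 sub-basins, `dissolve()` to merge them into a single polygon) can be sketched on toy geometries; the HUC codes and squares below are illustrative, not real basin boundaries:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two adjacent unit squares standing in for HUC12 sub-basins
gdf = gpd.GeoDataFrame(
    {'huc12': ['080902030506', '080902030507']},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs='EPSG:4326',
)

# Filter by code, then dissolve all selected rows into one polygon
selected = gdf[gdf['huc12'].isin(['080902030506', '080902030507'])].dissolve()
```

`dissolve()` with no `by=` argument unions every selected geometry into a single row, which is why `delta_gdf` ends up as one polygon covering the study area.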
Access Multispectral Data¶
In [46]:
# Log in to earthaccess
earthaccess.login(persist=True)
# Search for HLS tiles
results = earthaccess.search_data(
short_name="HLSL30",
cloud_hosted=True,
bounding_box=tuple(delta_gdf.total_bounds),
temporal=("2023-05", "2023-09"),
)
# Confirm the contents
num_granules = len(results)
print(f"Number of granules found: {num_granules}")
print(results[0])
Number of granules found: 76
Collection: {'EntryTitle': 'HLS Landsat Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30m v2.0'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -90.07372506, 'Latitude': 28.80444525}, {'Longitude': -89.57473392, 'Latitude': 28.81489352}, {'Longitude': -89.30816409, 'Latitude': 29.81030157}, {'Longitude': -90.10353675, 'Latitude': 29.79400715}, {'Longitude': -90.07372506, 'Latitude': 28.80444525}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-05-04T16:31:32.101Z', 'EndingDateTime': '2023-05-04T16:31:55.992Z'}}
Size(MB): 87.65197467803955
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B03.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B07.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B06.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B02.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B01.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.Fmask.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.SZA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.SAA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B05.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.VAA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B10.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B09.tif', 
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B11.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B04.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.VZA.tif']
In [47]:
results = earthaccess.search_data(
short_name="HLSL30",
cloud_hosted=True,
bounding_box=tuple(delta_gdf.total_bounds),
temporal=("2023-05", "2023-09"),
)
print(type(results)) # Prints the type of the 'results' object
<class 'list'>
Compile Information about each Granule¶
In [48]:
# Extract and organize Earthaccess data
def get_earthaccess_links(results):
url_re = re.compile(
r'\.(?P<tile_id>\w+)\.\d+T\d+\.v\d\.\d\.(?P<band>[A-Za-z0-9]+)\.tif')
# Loop through each granule
link_rows = [] # Initialize a list to hold GeoDataFrames
url_dfs = [] # Initialize a list to hold individual file data
for granule in tqdm(results):
# Get granule metadata information
info_dict = granule['umm']
granule_id = info_dict['GranuleUR']
datetime = pd.to_datetime(
info_dict
['TemporalExtent']['RangeDateTime']['BeginningDateTime'])
points = ( # Extract polygon coordinates
info_dict
['SpatialExtent']['HorizontalSpatialDomain']['Geometry']['GPolygons'][0]
['Boundary']['Points'])
geometry = Polygon( # Create a Shapely Polygon object
[(point['Longitude'], point['Latitude']) for point in points])
# Get data files associated with the granule
files = earthaccess.open([granule])
# Loop through each file within the granule
for file in files:
match = url_re.search(file.full_name)
if match is not None:
# Create a GeoDataFrame for the current file and append to list
link_rows.append(
gpd.GeoDataFrame(
dict(
datetime=[datetime],
tile_id=[match.group('tile_id')], # Extract tile ID
band=[match.group('band')], # Extract band name
url=[file], # Store the file object
geometry=[geometry] # Store the geometry
),
crs="EPSG:4326" # set coordinate reference system
)
)
# Concatenate GeoDataFrames into a single gdf
file_df = pd.concat(link_rows).reset_index(drop=True) # combine all rows
return file_df
results = earthaccess.search_data(
short_name="HLSL30",
cloud_hosted=True,
bounding_box=tuple(delta_gdf.total_bounds),
temporal=("2023-05", "2023-09"),
)
if results: # Check if results is not empty
first_result = results[0] # Get the first result
file_df = get_earthaccess_links([first_result]) # Pass a list containing only the first result to the function
print(file_df.head())
else:
print("No results found.")
datetime tile_id band \
0 2023-05-04 16:31:32.101000+00:00 T16RBT B03
1 2023-05-04 16:31:32.101000+00:00 T16RBT B07
2 2023-05-04 16:31:32.101000+00:00 T16RBT B06
3 2023-05-04 16:31:32.101000+00:00 T16RBT B02
4 2023-05-04 16:31:32.101000+00:00 T16RBT B01
url \
0 <File-like object HTTPFileSystem, https://data...
1 <File-like object HTTPFileSystem, https://data...
2 <File-like object HTTPFileSystem, https://data...
3 <File-like object HTTPFileSystem, https://data...
4 <File-like object HTTPFileSystem, https://data...
geometry
0 POLYGON ((-90.07373 28.80445, -89.57473 28.814...
1 POLYGON ((-90.07373 28.80445, -89.57473 28.814...
2 POLYGON ((-90.07373 28.80445, -89.57473 28.814...
3 POLYGON ((-90.07373 28.80445, -89.57473 28.814...
4 POLYGON ((-90.07373 28.80445, -89.57473 28.814...
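The `url_re` pattern in `get_earthaccess_links()` relies on the HLS file-naming convention (`product.tile.datetime.version.band.tif`). A quick standalone check against one of the file names from the granule listing above shows what the two named groups capture:

```python
import re

# Same pattern as in get_earthaccess_links()
url_re = re.compile(
    r'\.(?P<tile_id>\w+)\.\d+T\d+\.v\d\.\d\.(?P<band>[A-Za-z0-9]+)\.tif')

# File name taken from the granule listing printed earlier
name = 'HLS.L30.T16RBT.2023124T163132.v2.0.B03.tif'
match = url_re.search(name)
tile_id = match.group('tile_id')  # 'T16RBT'
band = match.group('band')        # 'B03'
```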
Open, Crop, and Mask Data using a Cached Decorator¶
In [49]:
@cached('delta_reflectance_da_df') # Function output stored, same input gives cached output improving performance
def compute_reflectance_da(search_results, boundary_gdf):
"""
Connect to files over VSI, crop, cloud mask, and wrangle
Returns a single reflectance DataFrame
with all bands as columns and
centroid coordinates and datetime as the index.
Parameters
==========
search_results : list
earthaccess search results for HLS granules
boundary_gdf : gpd.GeoDataFrame
Boundary used to crop the data
"""
def open_dataarray(url, boundary_proj_gdf, scale=1, masked=True):
# Open masked DataArray
da = rxr.open_rasterio(url, masked=masked).squeeze() * scale
# Reproject boundary if needed
if boundary_proj_gdf is None:
boundary_proj_gdf = boundary_gdf.to_crs(da.rio.crs)
# Crop
cropped = da.rio.clip_box(*boundary_proj_gdf.total_bounds)
return cropped
def compute_quality_mask(da, mask_bits=[1, 2, 3]):
"""Mask out low quality data by bit"""
# Unpack bits into a new axis
bits = (
np.unpackbits(
da.astype(np.uint8), bitorder='little'
).reshape(da.shape + (-1,))
)
# Select the required bits and check if any are flagged
mask = np.prod(bits[..., mask_bits]==0, axis=-1)
return mask
file_df = get_earthaccess_links(search_results)
granule_da_rows = []
boundary_proj_gdf = None
# Loop through each image
group_iter = file_df.groupby(['datetime', 'tile_id'])
for (datetime, tile_id), granule_df in tqdm(group_iter):
print(f'Processing granule {tile_id} {datetime}')
# Open granule cloud cover
cloud_mask_url = (
granule_df.loc[granule_df.band=='Fmask', 'url']
.values[0])
cloud_mask_cropped_da = open_dataarray(cloud_mask_url, boundary_proj_gdf, masked=False)
# Compute cloud mask
cloud_mask = compute_quality_mask(cloud_mask_cropped_da)
# Loop through each spectral band
for i, row in granule_df.iterrows():
if row.band.startswith('B'):
# Open, crop, and mask the band
band_cropped = open_dataarray(
row.url, boundary_proj_gdf, scale=0.0001)
band_cropped.name = row.band
# Add the DataArray to the metadata DataFrame row
row['da'] = band_cropped.where(cloud_mask)
granule_da_rows.append(row.to_frame().T)
# Reassemble the metadata DataFrame
return pd.concat(granule_da_rows)
reflectance_da_df = compute_reflectance_da(results, delta_gdf)
reflectance_da_df.head()
Out[49]:
| datetime | tile_id | band | url | geometry | da | |
|---|---|---|---|---|---|---|
| 45 | 2024-06-07 16:31:11.509000+00:00 | T15RYN | B04 | <File-like object HTTPFileSystem, https://data... | POLYGON ((-89.82661214 28.80213717, -89.795837... | [[<xarray.DataArray 'B04' ()> Size: 4B\narray(... |
| 48 | 2024-06-07 16:31:11.509000+00:00 | T15RYN | B06 | <File-like object HTTPFileSystem, https://data... | POLYGON ((-89.82661214 28.80213717, -89.795837... | [[<xarray.DataArray 'B06' ()> Size: 4B\narray(... |
| 49 | 2024-06-07 16:31:11.509000+00:00 | T15RYN | B03 | <File-like object HTTPFileSystem, https://data... | POLYGON ((-89.82661214 28.80213717, -89.795837... | [[<xarray.DataArray 'B03' ()> Size: 4B\narray(... |
| 50 | 2024-06-07 16:31:11.509000+00:00 | T15RYN | B07 | <File-like object HTTPFileSystem, https://data... | POLYGON ((-89.82661214 28.80213717, -89.795837... | [[<xarray.DataArray 'B07' ()> Size: 4B\narray(... |
| 51 | 2024-06-07 16:31:11.509000+00:00 | T15RYN | B02 | <File-like object HTTPFileSystem, https://data... | POLYGON ((-89.82661214 28.80213717, -89.795837... | [[<xarray.DataArray 'B02' ()> Size: 4B\narray(... |
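`compute_quality_mask()` unpacks each Fmask byte into its eight bits and keeps only pixels where none of the masked bits are set. A standalone run on a toy array illustrates the mechanics; the bit meanings (bits 1-3 flagging cloud, adjacent cloud, and cloud shadow) are stated here as an assumption based on the HLS v2.0 quality-flag convention:

```python
import numpy as np

def compute_quality_mask(da, mask_bits=[1, 2, 3]):
    """1 where none of the given bits are set, 0 otherwise."""
    bits = np.unpackbits(
        da.astype(np.uint8), bitorder='little'
    ).reshape(da.shape + (-1,))
    return np.prod(bits[..., mask_bits] == 0, axis=-1)

# 0b0000: clear; 0b0010: bit 1 set (cloud); 0b1000: bit 3 set (shadow);
# 0b0001: only bit 0 set, which is not in mask_bits, so the pixel is kept
fmask = np.array([[0b0000, 0b0010],
                  [0b1000, 0b0001]], dtype=np.uint8)
mask = compute_quality_mask(fmask)  # [[1, 0], [0, 1]]
```

The resulting 0/1 array is then applied with `band_cropped.where(cloud_mask)`, which turns flagged pixels into NaN.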
Merge and Composite Data¶
In [51]:
@cached('delta_reflectance_da') # Function output stored, same input gives cached output improving performance
def merge_and_composite_arrays(granule_da_df):
# Merge and composite an image for each band
df_list = [] # List to store DataFrames
da_list = [] # List to store the composited DataArrays for each band
for band, band_df in tqdm(granule_da_df.groupby('band')): # Iterate through each band
merged_das = [] # List to store merged DAs for each date within the current band
for datetime, date_df in tqdm(band_df.groupby('datetime')): # Iterate through each date within the current band
# Merge granules for each date
merged_da = rxrmerge.merge_arrays(list(date_df.da))
# Mask negative values
merged_da = merged_da.where(merged_da>0)
# Add the merged DA for the current date to the list
merged_das.append(merged_da)
# Composite images across dates with median reflectance value for each pixel reducing the effect of cloud cover.
composite_da = xr.concat(merged_das, dim='datetime').median('datetime')
composite_da['band'] = int(band[1:]) # Converts band name to integer
composite_da.name = 'reflectance' # Sets the name of the DataArray
da_list.append(composite_da) # Add the composite DA for the current band to the list
# Concatenate the composite DataArrays for all bands into a single DataArray
return xr.concat(da_list, dim='band')
reflectance_da = merge_and_composite_arrays(reflectance_da_df) # Call the function to process the reflectance data
reflectance_da[0] # Display the first resulting reflectance DataArray
Out[51]:
<xarray.DataArray 'reflectance' (y: 556, x: 624)> Size: 1MB
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
* x (x) float64 5kB 7.926e+05 7.926e+05 ... 8.112e+05 8.113e+05
* y (y) float64 4kB 3.304e+06 3.304e+06 ... 3.287e+06 3.287e+06
band int64 8B 1
spatial_ref int64 8B 0
Analyze using K-Means¶
In [58]:
# How many different types of vegetation are there?
# Convert spectral DataArray to DataFrame
model_df = reflectance_da.to_dataframe().reflectance.unstack('band')
model_df = model_df.drop(columns=[10, 11]).dropna() # drop thermal bands (10, 11) and masked pixels
model_df.head()
Out[58]:
| band | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 | |
|---|---|---|---|---|---|---|---|---|---|
| y | x | ||||||||
| 3.303783e+06 | 810148.062907 | 0.01560 | 0.0225 | 0.0409 | 0.0366 | 0.0478 | 0.0281 | 0.0181 | 0.0006 |
| 810178.062907 | 0.01895 | 0.0256 | 0.0396 | 0.0413 | 0.0426 | 0.0284 | 0.0220 | 0.0006 | |
| 810208.062907 | 0.01915 | 0.0246 | 0.0387 | 0.0377 | 0.0384 | 0.0273 | 0.0236 | 0.0007 | |
| 810238.062907 | 0.02040 | 0.0247 | 0.0440 | 0.0445 | 0.0629 | 0.0418 | 0.0266 | 0.0007 | |
| 810268.062907 | 0.01815 | 0.0245 | 0.0437 | 0.0444 | 0.0618 | 0.0397 | 0.0259 | 0.0008 |
In [59]:
# Running the fit and predict functions at the same time.
# We can do this since we don't have target data.
prediction = KMeans(n_clusters=6).fit_predict(model_df.values)
# Add the predicted values back to the model DataFrame
model_df['clusters'] = prediction
model_df
Out[59]:
| band | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 | clusters | |
|---|---|---|---|---|---|---|---|---|---|---|
| y | x | |||||||||
| 3.303783e+06 | 810148.062907 | 0.01560 | 0.0225 | 0.0409 | 0.0366 | 0.0478 | 0.0281 | 0.0181 | 0.0006 | 0 |
| 810178.062907 | 0.01895 | 0.0256 | 0.0396 | 0.0413 | 0.0426 | 0.0284 | 0.0220 | 0.0006 | 0 | |
| 810208.062907 | 0.01915 | 0.0246 | 0.0387 | 0.0377 | 0.0384 | 0.0273 | 0.0236 | 0.0007 | 0 | |
| 810238.062907 | 0.02040 | 0.0247 | 0.0440 | 0.0445 | 0.0629 | 0.0418 | 0.0266 | 0.0007 | 4 | |
| 810268.062907 | 0.01815 | 0.0245 | 0.0437 | 0.0444 | 0.0618 | 0.0397 | 0.0259 | 0.0008 | 4 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3.287163e+06 | 793798.062907 | 0.02650 | 0.0345 | 0.0548 | 0.0427 | 0.0218 | 0.0098 | 0.0074 | 0.0007 | 0 |
| 793828.062907 | 0.02790 | 0.0351 | 0.0549 | 0.0439 | 0.0221 | 0.0104 | 0.0076 | 0.0008 | 0 | |
| 793858.062907 | 0.02580 | 0.0331 | 0.0534 | 0.0419 | 0.0194 | 0.0080 | 0.0059 | 0.0009 | 0 | |
| 793888.062907 | 0.02570 | 0.0326 | 0.0521 | 0.0402 | 0.0182 | 0.0064 | 0.0046 | 0.0007 | 0 | |
| 793918.062907 | 0.02550 | 0.0340 | 0.0541 | 0.0423 | 0.0199 | 0.0083 | 0.0060 | 0.0007 | 0 |
318248 rows × 9 columns
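The cell above fixes `n_clusters=6`, but the `silhouette_score` import suggests a data-driven check of that choice. A sketch on synthetic data (with `make_blobs` standing in for `model_df.values`, since scoring 300k+ real pixels would be slow) shows the pattern; the sizes and seeds here are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the reflectance table: four well-separated blobs
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.5, random_state=42)

# Silhouette score (in [-1, 1]) rewards tight, well-separated clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)  # peaks at the true number of blobs
```

On real reflectance data the peak is rarely this clean; comparing scores across a range of `k` (optionally on a random subsample of pixels) still gives a more defensible cluster count than picking 6 a priori.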
Plot Reflectance¶
In [60]:
# Select bands from reflectance data array
rgb = reflectance_da.sel(band=[4, 3, 2]) # band 4=red, 3=green and 2=blue
rgb_uint8 = (rgb * 255).astype(np.uint8).where(rgb.notnull()) # convert to integers (0-255); notnull() masks NaNs (x != np.nan is always True, so it cannot)
rgb_bright = rgb_uint8 * 10 # increase brightness
rgb_sat = rgb_bright.where(rgb_bright < 255, 255) # no pixel exceeds 255
# Create a composite RGB image plot in square
(
rgb_sat.hvplot.rgb(
x='x', y='y', bands='band',
data_aspect=1,
xaxis=None, yaxis=None)
+ # Overlay cluster data
model_df.clusters.to_xarray().sortby(['x', 'y']).hvplot(
cmap="accent", aspect='equal', xaxis=None, yaxis=None)
)
Out[60]:
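The brighten-then-saturate step above is a clip in disguise: multiply, then cap at 255. A NumPy sketch of the same arithmetic (using `uint16` here, since multiplying a `uint8` array by 10 can silently wrap past 255):

```python
import numpy as np

# Surface reflectance scaled to 0-255 is mostly dark (land is ~0.0-0.1),
# so multiply for brightness, then saturate so no pixel exceeds 255
rgb = np.array([10, 40, 200], dtype=np.uint16)
bright = rgb * 10                                 # [100, 400, 2000]
saturated = np.where(bright < 255, bright, 255)   # [100, 255, 255]
```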
Land Cover Analysis Using K-Means Clusters from Sentinel/Landsat Multispectral Data¶
Land Cover Interpretation based on Spectral Data¶
According to America’s Watershed Initiative, the wetlands in the lower Mississippi region being studied have been depleted annually since the 1930s, and excess nutrient discharges have created "dead zones," areas of low oxygen where life struggles to exist. [1] The K-means cluster analysis in the images above shows six clusters of land cover spread across the region. Some, like clusters 4 and 0, have areas where they are concentrated, but most clusters are highly dispersed. In a study by Roy et al., their "...data suggests that single transition land loss is caused by wave-edge erosion, whereas multiple transition land loss is caused by subsidence." [2] Given that most of the land-cover clusters are fragmented, I would expect clusters 4 and 0 to be mostly aquatic and the remainder to be composed of grasses, sedges, and rushes.¶
Sources¶
- America’s Watershed Initiative Report Card for the Mississippi River. Dec. 2015. Available online at: https://americaswatershed.org/wp-content/uploads/2015/12/Mississippi-River-Report-Card-Methods-v10.1.pdf
- Samapriya Roy et al. 2020. Spatial and temporal patterns of land loss in the Lower Mississippi River Delta from 1983 to 2016. Remote Sensing of Environment 250 (2020) 112046. Available online at: https://www.sciencedirect.com/science/article/abs/pii/S0034425720304168